Goto

Collaborating Authors

 distributional reward decomposition


Distributional Reward Decomposition for Reinforcement Learning

Neural Information Processing Systems

Many reinforcement learning (RL) tasks have specific properties that can be leveraged to modify existing RL algorithms to adapt to those tasks and further improve performance, and a general class of such properties is the multiple reward channel. In those environments the full reward can be decomposed into sub-rewards obtained from different channels. Existing work on reward decomposition either requires prior knowledge of the environment to decompose the full reward, or decomposes reward without prior knowledge but with degraded performance. In this paper, we propose Distributional Reward Decomposition for Reinforcement Learning (DRDRL), a novel reward decomposition algorithm which captures the multiple reward channel structure under distributional setting. Empirically, our method captures the multi-channel structure and discovers meaningful reward decomposition, without any requirements on prior knowledge. Consequently, our agent achieves better performance than existing methods on environments with multiple reward channels.



Reviews: Distributional Reward Decomposition for Reinforcement Learning

Neural Information Processing Systems

The submission introduces a method for distributional reward decomposition which is more generally applicable than prior work, removing requirements for arbitrary resets as well as domain knowledge. To further strengthen disentanglement the objective is extended to maximise the KL divergence between the distributions resulting from actions optimising for different subrewards (treating the learned Q functions as epsilon greedy policies). Overall, the work provides a valuable contribution to RL by investigating (and benefitting from) reward decomposition in a distributional setting. The combination of reward decomposition and distributional RL provides novelty and as demonstrated in the experimental section better agent performance by exploiting task structure. It would be interesting in this context to see how the approach fares in tasks with only a single source of reward and potential situations where the method might perform worse than the baseline.


Reviews: Distributional Reward Decomposition for Reinforcement Learning

Neural Information Processing Systems

The reviewers enjoyed the paper, although expressed some concerns regarding the novelty (it combines a number of existing ideas). Still, the combination does result in a clear performance increase on a small set of Atari 2600 games. In the discussion the reviewers appreciated the additional experiments provided in the rebuttal, and reiterated the need for the final version of this paper to incorporate these and to be cleaned up. I also want to encourage the authors to report the performance of their algorithm on a larger number of Atari 2600 games -- in particular, how were these 6 games selected? Was there an unconscious bias in this selection?


Distributional Reward Decomposition for Reinforcement Learning

Neural Information Processing Systems

Many reinforcement learning (RL) tasks have specific properties that can be leveraged to modify existing RL algorithms to adapt to those tasks and further improve performance, and a general class of such properties is the multiple reward channel. In those environments the full reward can be decomposed into sub-rewards obtained from different channels. Existing work on reward decomposition either requires prior knowledge of the environment to decompose the full reward, or decomposes reward without prior knowledge but with degraded performance. In this paper, we propose Distributional Reward Decomposition for Reinforcement Learning (DRDRL), a novel reward decomposition algorithm which captures the multiple reward channel structure under distributional setting. Empirically, our method captures the multi-channel structure and discovers meaningful reward decomposition, without any requirements on prior knowledge.


Distributional Reward Decomposition for Reinforcement Learning

Neural Information Processing Systems

Many reinforcement learning (RL) tasks have specific properties that can be leveraged to modify existing RL algorithms to adapt to those tasks and further improve performance, and a general class of such properties is the multiple reward channel. In those environments the full reward can be decomposed into sub-rewards obtained from different channels. Existing work on reward decomposition either requires prior knowledge of the environment to decompose the full reward, or decomposes reward without prior knowledge but with degraded performance. In this paper, we propose Distributional Reward Decomposition for Reinforcement Learning (DRDRL), a novel reward decomposition algorithm which captures the multiple reward channel structure under distributional setting. Empirically, our method captures the multi-channel structure and discovers meaningful reward decomposition, without any requirements on prior knowledge.


Distributional Reward Decomposition for Reinforcement Learning

arXiv.org Artificial Intelligence

Many reinforcement learning (RL) tasks have specific properties that can be leveraged to modify existing RL algorithms to adapt to those tasks and further improve performance, and a general class of such properties is the multiple reward channel. In those environments the full reward can be decomposed into sub-rewards obtained from different channels. Existing work on reward decomposition either requires prior knowledge of the environment to decompose the full reward, or decomposes reward without prior knowledge but with degraded performance. In this paper, we propose Distributional Reward Decomposition for Reinforcement Learning (DRDRL), a novel reward decomposition algorithm which captures the multiple reward channel structure under distributional setting. Empirically, our method captures the multi-channel structure and discovers meaningful reward decomposition, without any requirements on prior knowledge. Consequently, our agent achieves better performance than existing methods on environments with multiple reward channels.